SlideShare a Scribd company logo
1 of 32
What Did They Do? Deriving High-Level Edit Histories in Wikis Peter Kin-Fong Fong and  Robert P. Biuk-Aghai Data Analytics and Collaborative Computing Group Department of Computer and Information Science Faculty of Science and Technology University of Macau
Motivation ,[object Object]
But: ,[object Object]
Who provides significant edits? ,[object Object],[object Object],[object Object]
Try to answer:
What Did They Do? 1: en - de (..450,000) 2: es - it (..70,000) 3: ja - pt (..50,000) 4: vo - nl (..35000) 5: zh - sv (..22500) 6: hu - tr (..20,000) 7: ca - no (..15000) http://stats.wikimedia.org/EN/ PlotsPngDatabaseEdits.htm
Revision History
Looking at Byte Count? 2,070 bytes (new) – 2,055 bytes (old) = 15 bytes changed But: 56 bytes were cut 71 bytes were added 56 + 71 = 127 bytes changed
Looking at Minor vs. Non-Minor Edit? -> Minor edit flag: may not be available; subjective 53 bytes changed Minor 1 byte changed Non-Minor
Related Work
Related Work: Text Differencing ,[object Object],[object Object]
Does not take movement into account
Basis of Wikipedia “Diff” function Alpha  Bravo  Charlie Delta  Echo Alpha Charlie Delta  Golf Bravo Alpha Bravo Charlie Delta Echo Alpha Charlie Delta  Golf Bravo
Related Work: Text Differencing ,[object Object],[object Object]
Movement can be detected
Basis of Wikitrust Alpha Charlie Delta  Golf  Bravo Alpha  Bravo   Charlie Delta  Echo Alpha Charlie Delta  Golf Bravo Alpha Bravo Charlie Delta Echo
Text Differencing Granularity ,[object Object],[object Object]
Hierarchical decomposition ,[object Object],[object Object],[object Object],[object Object]
Always do word level diff when paragraph is changed ,[object Object]
A hard-to-read Wikipedia Diff
Related Work: Edit Categorization ,[object Object],[object Object]
Requires its student editors to categorize their own edits ,[object Object],[object Object],[object Object]
Classify each of 3,665 edit by bare eye: “simple but tedious”
Categories ,[object Object]
Self-defined: Challenged, Unchallenged
Problems ,[object Object]
Rate the significance of an edit ,[object Object]
Proposed Design
Wiki Edit History Analyzer
Step 1: Lexical Analyzer ,[object Object],[object Object]
Step 2: Text Differencing Engine ,[object Object],[object Object]
Token (word & markup) level ,[object Object],[object Object],[object Object],[object Object]

More Related Content

What's hot (7)

BDACA1617s2 - Tutorial 1
BDACA1617s2 - Tutorial 1BDACA1617s2 - Tutorial 1
BDACA1617s2 - Tutorial 1
 
Question Answering - Application and Challenges
Question Answering - Application and ChallengesQuestion Answering - Application and Challenges
Question Answering - Application and Challenges
 
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language UnderstandingBERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
 
Survey On Building A Database Driven Reverse Dictionary
Survey On Building A Database Driven Reverse DictionarySurvey On Building A Database Driven Reverse Dictionary
Survey On Building A Database Driven Reverse Dictionary
 
Asp.net main
Asp.net mainAsp.net main
Asp.net main
 
Tomáš Mikolov - Distributed Representations for NLP
Tomáš Mikolov - Distributed Representations for NLPTomáš Mikolov - Distributed Representations for NLP
Tomáš Mikolov - Distributed Representations for NLP
 
BD-ACA week2
BD-ACA week2BD-ACA week2
BD-ACA week2
 

Viewers also liked

Ppt test
Ppt testPpt test
Ppt testkmburow
 
Two Cups Of Coffee
Two Cups Of CoffeeTwo Cups Of Coffee
Two Cups Of Coffeeguest701fc3
 
#noOps?? Transcendendo o DevOps pelo caminho de tijolos (não muito) amarelos
#noOps?? Transcendendo o DevOps pelo caminho de tijolos (não muito) amarelos#noOps?? Transcendendo o DevOps pelo caminho de tijolos (não muito) amarelos
#noOps?? Transcendendo o DevOps pelo caminho de tijolos (não muito) amarelosEvandro Silvestre
 
95 Theses VERTSOL
95 Theses VERTSOL95 Theses VERTSOL
95 Theses VERTSOLguestc203a4
 

Viewers also liked (6)

A First Look At Drupal
A First Look At DrupalA First Look At Drupal
A First Look At Drupal
 
Ppt test
Ppt testPpt test
Ppt test
 
Tatar
TatarTatar
Tatar
 
Two Cups Of Coffee
Two Cups Of CoffeeTwo Cups Of Coffee
Two Cups Of Coffee
 
#noOps?? Transcendendo o DevOps pelo caminho de tijolos (não muito) amarelos
#noOps?? Transcendendo o DevOps pelo caminho de tijolos (não muito) amarelos#noOps?? Transcendendo o DevOps pelo caminho de tijolos (não muito) amarelos
#noOps?? Transcendendo o DevOps pelo caminho de tijolos (não muito) amarelos
 
95 Theses VERTSOL
95 Theses VERTSOL95 Theses VERTSOL
95 Theses VERTSOL
 

Similar to What Did They Do? Deriving High-Level Edit Histories in Wikis

Encoding and Designing for the Swift Poems Project
Encoding and Designing for the Swift Poems ProjectEncoding and Designing for the Swift Poems Project
Encoding and Designing for the Swift Poems ProjectJames Griffin
 
Natural language processing and transformer models
Natural language processing and transformer modelsNatural language processing and transformer models
Natural language processing and transformer modelsDing Li
 
Doc format.
Doc format.Doc format.
Doc format.butest
 
2013 06-24 Wf4Ever: Annotating research objects (PDF)
2013 06-24 Wf4Ever: Annotating research objects (PDF)2013 06-24 Wf4Ever: Annotating research objects (PDF)
2013 06-24 Wf4Ever: Annotating research objects (PDF)Stian Soiland-Reyes
 
2013 06-24 Wf4Ever: Annotating research objects (PPTX)
2013 06-24 Wf4Ever: Annotating research objects (PPTX)2013 06-24 Wf4Ever: Annotating research objects (PPTX)
2013 06-24 Wf4Ever: Annotating research objects (PPTX)Stian Soiland-Reyes
 
A Model-Based Approach to Language Integration
A Model-Based Approach to Language Integration A Model-Based Approach to Language Integration
A Model-Based Approach to Language Integration Marco Torchiano
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
Atlanta MLconf Machine Learning Conference 09-23-2016
Atlanta MLconf Machine Learning Conference 09-23-2016Atlanta MLconf Machine Learning Conference 09-23-2016
Atlanta MLconf Machine Learning Conference 09-23-2016Chris Fregly
 
Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016
Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016
Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016MLconf
 
Effective Extraction of Thematically Grouped Key Terms From Text
Effective Extraction of Thematically Grouped Key Terms From TextEffective Extraction of Thematically Grouped Key Terms From Text
Effective Extraction of Thematically Grouped Key Terms From Textmaria.grineva
 
Word2Vec model to generate synonyms on the fly in Apache Lucene.pdf
Word2Vec model to generate synonyms on the fly in Apache Lucene.pdfWord2Vec model to generate synonyms on the fly in Apache Lucene.pdf
Word2Vec model to generate synonyms on the fly in Apache Lucene.pdfSease
 
Protocol oriented programming_talk_ppt
Protocol oriented programming_talk_pptProtocol oriented programming_talk_ppt
Protocol oriented programming_talk_pptJayant Jaiswal
 
short_talk
short_talkshort_talk
short_talk卓 陈
 
Understanding Natural Languange with Corpora-based Generation of Dependency G...
Understanding Natural Languange with Corpora-based Generation of Dependency G...Understanding Natural Languange with Corpora-based Generation of Dependency G...
Understanding Natural Languange with Corpora-based Generation of Dependency G...Edmond Lepedus
 

Similar to What Did They Do? Deriving High-Level Edit Histories in Wikis (20)

Lecture1
Lecture1Lecture1
Lecture1
 
Encoding and Designing for the Swift Poems Project
Encoding and Designing for the Swift Poems ProjectEncoding and Designing for the Swift Poems Project
Encoding and Designing for the Swift Poems Project
 
Natural language processing and transformer models
Natural language processing and transformer modelsNatural language processing and transformer models
Natural language processing and transformer models
 
Doc format.
Doc format.Doc format.
Doc format.
 
210 ober ssp 2007
210 ober ssp 2007210 ober ssp 2007
210 ober ssp 2007
 
2013 06-24 Wf4Ever: Annotating research objects (PDF)
2013 06-24 Wf4Ever: Annotating research objects (PDF)2013 06-24 Wf4Ever: Annotating research objects (PDF)
2013 06-24 Wf4Ever: Annotating research objects (PDF)
 
2013 06-24 Wf4Ever: Annotating research objects (PPTX)
2013 06-24 Wf4Ever: Annotating research objects (PPTX)2013 06-24 Wf4Ever: Annotating research objects (PPTX)
2013 06-24 Wf4Ever: Annotating research objects (PPTX)
 
A Model-Based Approach to Language Integration
A Model-Based Approach to Language Integration A Model-Based Approach to Language Integration
A Model-Based Approach to Language Integration
 
20100427 Earthster Core Ontology
20100427 Earthster Core Ontology20100427 Earthster Core Ontology
20100427 Earthster Core Ontology
 
A-Study_TopicModeling
A-Study_TopicModelingA-Study_TopicModeling
A-Study_TopicModeling
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
Atlanta MLconf Machine Learning Conference 09-23-2016
Atlanta MLconf Machine Learning Conference 09-23-2016Atlanta MLconf Machine Learning Conference 09-23-2016
Atlanta MLconf Machine Learning Conference 09-23-2016
 
Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016
Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016
Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016
 
Effective Extraction of Thematically Grouped Key Terms From Text
Effective Extraction of Thematically Grouped Key Terms From TextEffective Extraction of Thematically Grouped Key Terms From Text
Effective Extraction of Thematically Grouped Key Terms From Text
 
Ontology
OntologyOntology
Ontology
 
Word2Vec model to generate synonyms on the fly in Apache Lucene.pdf
Word2Vec model to generate synonyms on the fly in Apache Lucene.pdfWord2Vec model to generate synonyms on the fly in Apache Lucene.pdf
Word2Vec model to generate synonyms on the fly in Apache Lucene.pdf
 
Protocol oriented programming_talk_ppt
Protocol oriented programming_talk_pptProtocol oriented programming_talk_ppt
Protocol oriented programming_talk_ppt
 
short_talk
short_talkshort_talk
short_talk
 
Understanding Natural Languange with Corpora-based Generation of Dependency G...
Understanding Natural Languange with Corpora-based Generation of Dependency G...Understanding Natural Languange with Corpora-based Generation of Dependency G...
Understanding Natural Languange with Corpora-based Generation of Dependency G...
 
Icsm07.ppt
Icsm07.pptIcsm07.ppt
Icsm07.ppt
 

Recently uploaded

Top 10 CodeIgniter Development Companies
Top 10 CodeIgniter Development CompaniesTop 10 CodeIgniter Development Companies
Top 10 CodeIgniter Development CompaniesTopCSSGallery
 
How to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cfHow to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cfdanishmna97
 
WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024Lorenzo Miniero
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAnitaRaj43
 
UiPath manufacturing technology benefits and AI overview
UiPath manufacturing technology benefits and AI overviewUiPath manufacturing technology benefits and AI overview
UiPath manufacturing technology benefits and AI overviewDianaGray10
 
Frisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdf
Frisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdfFrisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdf
Frisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdfAnubhavMangla3
 
How we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdfHow we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdfSrushith Repakula
 
AI mind or machine power point presentation
AI mind or machine power point presentationAI mind or machine power point presentation
AI mind or machine power point presentationyogeshlabana357357
 
CORS (Kitworks Team Study 양다윗 발표자료 240510)
CORS (Kitworks Team Study 양다윗 발표자료 240510)CORS (Kitworks Team Study 양다윗 발표자료 240510)
CORS (Kitworks Team Study 양다윗 발표자료 240510)Wonjun Hwang
 
Design and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data ScienceDesign and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data SciencePaolo Missier
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMKumar Satyam
 
Simplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxSimplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxMarkSteadman7
 
Generative AI Use Cases and Applications.pdf
Generative AI Use Cases and Applications.pdfGenerative AI Use Cases and Applications.pdf
Generative AI Use Cases and Applications.pdfalexjohnson7307
 
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptxHarnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptxFIDO Alliance
 
How to Check GPS Location with a Live Tracker in Pakistan
How to Check GPS Location with a Live Tracker in PakistanHow to Check GPS Location with a Live Tracker in Pakistan
How to Check GPS Location with a Live Tracker in Pakistandanishmna97
 
Microsoft BitLocker Bypass Attack Method.pdf
Microsoft BitLocker Bypass Attack Method.pdfMicrosoft BitLocker Bypass Attack Method.pdf
Microsoft BitLocker Bypass Attack Method.pdfOverkill Security
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
Microsoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - QuestionnaireMicrosoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - QuestionnaireExakis Nelite
 
Working together SRE & Platform Engineering
Working together SRE & Platform EngineeringWorking together SRE & Platform Engineering
Working together SRE & Platform EngineeringMarcus Vechiato
 
State of the Smart Building Startup Landscape 2024!
State of the Smart Building Startup Landscape 2024!State of the Smart Building Startup Landscape 2024!
State of the Smart Building Startup Landscape 2024!Memoori
 

Recently uploaded (20)

Top 10 CodeIgniter Development Companies
Top 10 CodeIgniter Development CompaniesTop 10 CodeIgniter Development Companies
Top 10 CodeIgniter Development Companies
 
How to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cfHow to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cf
 
WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by Anitaraj
 
UiPath manufacturing technology benefits and AI overview
UiPath manufacturing technology benefits and AI overviewUiPath manufacturing technology benefits and AI overview
UiPath manufacturing technology benefits and AI overview
 
Frisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdf
Frisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdfFrisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdf
Frisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdf
 
How we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdfHow we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdf
 
AI mind or machine power point presentation
AI mind or machine power point presentationAI mind or machine power point presentation
AI mind or machine power point presentation
 
CORS (Kitworks Team Study 양다윗 발표자료 240510)
CORS (Kitworks Team Study 양다윗 발표자료 240510)CORS (Kitworks Team Study 양다윗 발표자료 240510)
CORS (Kitworks Team Study 양다윗 발표자료 240510)
 
Design and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data ScienceDesign and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data Science
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDM
 
Simplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxSimplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptx
 
Generative AI Use Cases and Applications.pdf
Generative AI Use Cases and Applications.pdfGenerative AI Use Cases and Applications.pdf
Generative AI Use Cases and Applications.pdf
 
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptxHarnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
 
How to Check GPS Location with a Live Tracker in Pakistan
How to Check GPS Location with a Live Tracker in PakistanHow to Check GPS Location with a Live Tracker in Pakistan
How to Check GPS Location with a Live Tracker in Pakistan
 
Microsoft BitLocker Bypass Attack Method.pdf
Microsoft BitLocker Bypass Attack Method.pdfMicrosoft BitLocker Bypass Attack Method.pdf
Microsoft BitLocker Bypass Attack Method.pdf
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Microsoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - QuestionnaireMicrosoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - Questionnaire
 
Working together SRE & Platform Engineering
Working together SRE & Platform EngineeringWorking together SRE & Platform Engineering
Working together SRE & Platform Engineering
 
State of the Smart Building Startup Landscape 2024!
State of the Smart Building Startup Landscape 2024!State of the Smart Building Startup Landscape 2024!
State of the Smart Building Startup Landscape 2024!
 

What Did They Do? Deriving High-Level Edit Histories in Wikis